1,822 research outputs found

    Real-valued feature selection for process approximation and prediction

    Get PDF
    The selection of features for classification, clustering and approximation is an important task in pattern recognition, data mining and soft computing. For real-valued features, this contribution shows how feature selection for a large number of features can be implemented using mutual information. In particular, the common problem of estimating joint probabilities in many dimensions from only a few samples is addressed by using the Rényi mutual information of order two as the computational basis. For this, the Grassberger-Takens correlation integral is used, which was developed for estimating probability densities in chaos theory. Additionally, an adaptive procedure for choosing the hypercube size is introduced, and for real-world applications the treatment of missing values is included. The computation is accelerated by exploiting the ranking of the real feature values, especially for time series. A small blackbox-glassbox example shows how the relevant features and their time lags are determined in a time series even when the input feature time series determine the output nonlinearly. A more realistic example from the chemical industry shows that this enables a better approximation of the input-output mapping than the best neural network approach developed for an international contest. Thanks to the computationally efficient implementation, mutual information becomes an attractive tool for feature selection even for a large number of real-valued features.
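    A rough sense of how the order-two Rényi mutual information can be estimated with a Grassberger-Takens-style correlation sum is given by the following sketch. It is a minimal illustration under stated assumptions: a fixed hypercube size `eps` replaces the adaptive procedure described above, missing-value handling and the rank-based acceleration are omitted, and the additive form I2(X;Y) = H2(X) + H2(Y) - H2(X,Y) is used as the relevance score; all function names are illustrative.

```python
import numpy as np

def renyi2_entropy(data, eps):
    """Order-2 Renyi entropy estimate H2 = -log C(eps), where C(eps) is the
    Grassberger-Takens correlation sum: the fraction of sample pairs whose
    maximum-norm distance is smaller than the hypercube size eps."""
    x = np.asarray(data, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                        # shape (n_samples, 1)
    n = x.shape[0]
    # pairwise maximum-norm (Chebyshev) distances between all samples
    d = np.abs(x[:, None, :] - x[None, :, :]).max(axis=-1)
    iu = np.triu_indices(n, k=1)
    c = np.mean(d[iu] < eps)                  # correlation-integral estimate
    return -np.log(max(c, 1.0 / len(iu[0])))  # guard against log(0)

def renyi2_mutual_information(x, y, eps):
    """Additive order-2 Renyi mutual information used as a feature score:
    I2(X;Y) = H2(X) + H2(Y) - H2(X,Y)."""
    joint = np.column_stack([np.ravel(x), np.ravel(y)])
    return (renyi2_entropy(x, eps) + renyi2_entropy(y, eps)
            - renyi2_entropy(joint, eps))

# Toy check: a nonlinearly dependent pair should score higher than an independent one.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
print(renyi2_mutual_information(x, x**2, eps=0.2))                         # dependent
print(renyi2_mutual_information(x, rng.uniform(-1.0, 1.0, 500), eps=0.2))  # independent
```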

    Nonlinear feature selection with the generalised mutual information

    Get PDF
    In the context of information theory, the term mutual information was first formulated by Claude Elwood Shannon. Information theory is the consistent mathematical description of technical communication systems; to this day it is the basis of numerous applications in modern communications engineering and has become indispensable in this field. This work is concerned with the development of a concept for nonlinear feature selection from scalar, multivariate data on the basis of the mutual information. From the viewpoint of modelling, the successful construction of a realistic model depends strongly on the quality of the employed data. In the ideal case, high-quality data consists only of the features relevant for deriving the model. In this context, it is important to possess a suitable method for measuring the degree of the mostly nonlinear dependencies between input and output variables. By means of such a measure, the relevant features can be selected specifically. During the course of this work it will become evident that the mutual information is a valuable and feasible measure for this task and hence the method of choice for practical applications. Basically, and without any claim to being exhaustive, there are two constellations that recommend the application of feature selection. On the one hand, feature selection plays an important role if the computability of a derived system model cannot be guaranteed because of a multitude of available features. On the other hand, the existence of very few data points with a significant number of features also calls for feature selection. The latter constellation is closely related to the so-called "curse of dimensionality". The statement behind this is the necessity to reduce the dimensionality in order to obtain an adequate coverage of the data space: for a constant number of data points, the coverage of the data space decreases exponentially with the dimensionality of the available data. In the context of a mapping between input and output space, this goal is ideally reached by selecting only the relevant features from the available data set. The basic idea for this work has its origin in the rather practical field of automotive engineering. It was motivated by the goals of a complex research project in which the nonlinear, dynamic dependencies among a multitude of sensor signals were to be identified. The final goal of these activities was to derive so-called virtual sensors from identified dependencies among the installed automotive sensors. This enables the required variable to be computed in real time without the expense of additional hardware. The prospect of doing without additional computing hardware is a strong motivating force, particularly in automotive engineering. In this context, the major problem was to find a feasible method to capture both linear and nonlinear dependencies. As mentioned before, the goal of this work is the development of a flexibly applicable system for nonlinear feature selection. The important point here is to guarantee the practical computability of the developed method even for high-dimensional data spaces, which are common in technical environments. The employed measure for the feature selection process is based on the concept of mutual information.
The high sensitivity and specificity of the mutual information to both linear and nonlinear statistical dependencies make it the method of choice for the development of a highly flexible, nonlinear feature selection framework. In addition to the mere selection of relevant features, the developed framework is also applicable to the nonlinear analysis of the temporal influences of the selected features. Hence, subsequent dynamic modelling can be performed more efficiently, since the proposed feature selection algorithm additionally provides information about the temporal dependencies between input and output variables. In contrast to feature extraction techniques, the feature selection algorithm developed in this work has another considerable advantage: in the case of cost-intensive measurements, the variables with the highest information content can be selected in a prior feasibility study. Hence, the developed method can also be employed to avoid redundancy in the acquired data and thus prevent additional costs.
The term mutual information (Transinformation) was first coined by Claude Elwood Shannon in the context of information theory, a unified mathematical description of technical communication systems. Against this background, the present work deals with the development of a practically applicable method for the nonlinear feature selection of quantitative, multivariate data on the basis of the information-theoretic concept of mutual information already mentioned. The success of the transition from real measurement data to a suitable model description is determined decisively by the quality of the data sets used. In the ideal case, a high-quality data set consists exclusively of the data relevant for a successful model formulation. In this context, the question immediately arises whether a suitable measure exists with which the degree of the, in general nonlinear, functional dependence between inputs and outputs can be captured quantitatively and correctly. With the help of such a quantity, the relevant features can be selected specifically and thus separated from the redundant features. In the course of this work it will become clear that the mutual information mentioned at the outset is a suitable measure for this purpose and holds up very well in practical use. The original motivation for this work has its thoroughly practical background in automotive engineering. It arose within a complex research project for the determination of nonlinear, dynamic dependencies between a multitude of measured sensor signals. The goal of these activities was to derive so-called virtual sensors by identifying nonlinear, dynamic dependencies between the sensors installed in the automobile. The concrete task was to make the determination of a central engine quantity so efficient that it can be computed under hard real-time constraints without additional hardware. Being able to dispense with additional hardware and to make do with the computing power already available is an extremely strong motivation, particularly in automotive engineering, because of the enormous costs that would otherwise result.
In this context, the major problem repeatedly encountered was to find a practically computable method that can reliably and quantitatively capture both linear and nonlinear dependencies. In the course of this work, different selection strategies are combined with the mutual information and their properties are compared. The combination of mutual information with the so-called forward selection strategy proves to be particularly interesting: it is shown that, in contrast to other approaches, this combination is what makes the computation practically feasible for high-dimensional data spaces in the first place. Subsequently, the convergence of this new feature selection procedure is proven. We will further see that the results obtained lie remarkably close to the optimal solution and are clearly superior to an alternative selection strategy. In parallel to the actual selection of the relevant features, the method developed in this work also makes it possible to perform a nonlinear analysis of the temporal dependencies of the selected features. Subsequent dynamic modelling can thus be carried out much more efficiently, since the developed feature selection provides additional information about the dynamic relationship between input and output data. With the method developed in this work, something has finally been achieved that was not possible before: the quantitative capture of the nonlinear dependencies between dedicated sensor signals, so that they can be incorporated into an efficient feature selection. In contrast to feature extraction, the method of nonlinear feature selection developed in this work has another decisive advantage. Particularly for very cost-intensive measurements, those variables can be selected that carry the highest information content with respect to the mapping onto an output quantity. Besides the purely technical aspect of basing the selection decision directly on the information content of the available data, the developed method can also be used ahead of cost-relevant decisions in order to specifically avoid redundancy and the associated higher costs.
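    Both abstracts emphasise that combining mutual information with a forward selection strategy is what keeps the search practical in high-dimensional spaces: at each step, the candidate feature is added that maximises the estimated joint mutual information of the selected subset with the target. The sketch below illustrates that greedy loop only; the crude histogram estimator stands in for the generalised mutual information developed in the thesis, and all names, bin counts and the toy data are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def hist_mutual_information(X, y, bins=6):
    """Crude joint mutual information I(X; y) from an equal-frequency histogram.
    X: (n_samples, n_selected_features), y: (n_samples,). Only usable for small
    feature subsets; a placeholder for a proper nonlinear MI estimator."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)

    def discretise(v):
        edges = np.quantile(v, np.linspace(0, 1, bins + 1)[1:-1])
        return np.searchsorted(edges, v)

    Xd = np.column_stack([discretise(col) for col in X.T])
    yd = discretise(y)
    n = len(y)
    joint = Counter(map(tuple, np.column_stack([Xd, yd])))
    marg_x = Counter(map(tuple, Xd))
    marg_y = Counter(yd.tolist())
    mi = 0.0
    for key, count in joint.items():
        pxy = count / n
        px = marg_x[key[:-1]] / n
        py = marg_y[key[-1]] / n
        mi += pxy * np.log(pxy / (px * py))
    return mi

def forward_select(X, y, n_select, mi=hist_mutual_information):
    """Greedy forward selection: in every round, add the remaining feature that
    maximises the estimated joint MI of the enlarged subset with the target."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        scores = {j: mi(X[:, selected + [j]], y) for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: the target depends nonlinearly on features 0 and 2 only.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (500, 5))
y = np.sin(3 * X[:, 0]) + X[:, 2] ** 2 + 0.05 * rng.normal(size=500)
print(forward_select(X, y, n_select=2))   # typically selects features 0 and 2
```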

    Economic and legal aspects of international environmental agreements: The case of enforcing and stabilising an international CO2 agreement

    Get PDF
    The protection of the global environment is impeded by multilateral externalities which the international community attempts to bring under control by entering into international agreements. International agreements, however, can suffer from non-compliance and free-riding behaviour by sovereign states and must therefore be enforced and stabilised internationally. This paper describes instruments for the enforcement and stabilisation of an international CO2 agreement and evaluates them in the light of economic and legal theory. Economic instruments build on repetition and use utility transfers, economic sanctions and flexible treaty adjustments. Important legal instruments are reciprocal obligations and cooperation duties, international funding and transfer rules, treaty suspension, retorsions and reprisals, treaty revision, and monitoring. The paper shows that economic and legal instruments are compatible to a considerable extent. It develops proposals for the enforcement and stabilisation of a global CO2 agreement and other multilateral treaties. Keywords: international environmental agreements, international cooperation, non-compliance, enforcement, global warming, international law

    “PICO”: Practice EBM skills, Increase student interests with Collaboration of librarians and improve Outcomes

    Get PDF
    Available literature on teaching evidence-based medicine (EBM) to medical students focuses on teaching critical appraisal skills, often in the context of a journal club, workshops or lectures. Being able to utilize EBM effectively means that a learner is able to take a clinical scenario, develop a clinically relevant question, search for the evidence, appraise that evidence, and apply the results of this appraisal back to the individual patient. Hence, EBM activity is more likely to become a part of clinical decision-making if medical students practice these skills in the context of direct patient care.

    Case study F. Hoffmann-La Roche: process simulation in Global Clinical Trial Supply

    Get PDF
    Optimally controlling the supply process in global clinical trials is a demanding task with challenging conflicting objectives. On the one hand, the risk of stock-outs (shortages) has to be reduced; on the other hand, additional costs from an overproduction of study medication ("overage") are to be avoided. This case study demonstrates how the simulation team at F. Hoffmann-La Roche Global Clinical Demand and Supply Management provides valuable decision support by predicting optimal production and delivery quantities. The use of simulation goes beyond operational process control and also provides impulses for the continuous improvement of the process chain, in that the supply process is continuously optimised and standardised on the basis of the simulation results.
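    As a rough illustration of the trade-off described above, the following Monte Carlo sketch estimates, for a given production quantity, the probability of a stock-out and the expected overage at the end of a trial. The Poisson demand model, the site counts and all numbers are invented for illustration and are not Roche's actual demand or supply model.

```python
import numpy as np

def simulate_supply(production_qty, n_sites=20, weeks=52,
                    weekly_demand_rate=1.5, n_runs=5000, seed=0):
    """Monte Carlo sketch of the stock-out vs. overage trade-off: draw total
    kit demand over the trial (Poisson per site and week), then report the
    stock-out probability and the expected number of unused kits."""
    rng = np.random.default_rng(seed)
    demand = rng.poisson(weekly_demand_rate,
                         size=(n_runs, weeks, n_sites)).sum(axis=(1, 2))
    stockout_prob = np.mean(demand > production_qty)
    expected_overage = np.mean(np.maximum(production_qty - demand, 0))
    return stockout_prob, expected_overage

# Sweeping candidate production quantities exposes the conflict between
# stock-out risk (falling) and overage (rising).
for qty in (1500, 1600, 1700, 1800):
    risk, overage = simulate_supply(qty)
    print(f"qty={qty}: stock-out risk={risk:.3f}, expected overage={overage:.0f} kits")
```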

    Evidence-Based Practice for Medical Students in a Family Medicine Clerkship: Collaborative, Active Learning for Clinical Decision Skills

    Get PDF
    Objectives: This active learning experience was designed to enhance the information literacy knowledge and skills of medical students for patient-centered, evidence-based decisions at the point of care. It includes formulating clinical questions using patient/problem, intervention, comparison, outcome (PICO), accessing the highest level of evidence-based medicine (EBM) information available in an effective manner, and evaluating the information in relation to a specific patient in an outpatient setting. Methods: Third-year medical students participate in a small-group collaborative, patient-centered learning experience during the family medicine clerkship, coordinated by the clerkship directors with participation by two medical librarians. At orientation, the clerkship directors provide the students with an overview of the evidence-based process and creating PICO questions. Librarians then direct a hands-on instruction session covering evidence-based resources and search strategies for finding point-of-care EBM information. Students select a clinical question from a patient encounter in their outpatient clinics. Each student submits a worksheet providing the PICO question, resources consulted, search strategy, selected bibliographic references, and clinical recommendations for their patient. Librarians provide a written assessment and suggestions for improvement relative to the students' search strategies and resource selections. Students then present their patient clinical question, research, and recommendations to the clinical faculty and student group. Results: In the most recent 6 months of this course, 85% of the 55 students participating were rated as “competent” in the areas of resource selection and literature searching on their EBM assignment. Pre- and post-test results indicate that a majority of the students had an increased familiarity with and appreciation of key evidence-based medicine resources such as Cochrane Reviews, ACP PIER, and FPIN after completing the EBM assignment. Student evaluations reflect increased interest and value in EBM through this experience. Conclusion: Providing an active learning, patient-centered experience with collaboration between clinical faculty and medical librarians has been successful in improving third-year medical student knowledge and skills in medical information literacy for clinical decision making. The project has also provided useful data for ongoing discussions with the college of medicine regarding increasing the longitudinal role of the library throughout the curriculum.

    b -> s gamma in the left-right supersymmetric model

    Full text link
    The rare decay $b \to s \gamma$ is studied in the left-right supersymmetric model. We give explicit expressions for all the amplitudes associated with the supersymmetric contributions coming from gluinos, charginos and neutralinos in the model at one-loop level. The branching ratio is enhanced significantly compared to the standard model and minimal supersymmetric standard model values by contributions from the right-handed gaugino and squark sector. We give numerical results coming from the leading-order contributions. If the only source of flavor violation comes from the CKM matrix, we constrain the scalar fermion-gaugino sector. If intergenerational mixings are allowed in the squark mass matrix, we constrain such supersymmetric sources of flavor violation. The decay $b \to s \gamma$ sets constraints on the parameters of the model and provides distinguishing signs from other supersymmetric scenarios. Comment: 12 figures
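    How a measured rare decay "sets constraints on the parameters of the model" can be pictured as a simple acceptance test applied to each scanned parameter point: keep only points whose predicted branching ratio falls within the experimentally allowed window. The sketch below shows that filtering step only; the numerical window is a rough placeholder (of order 3e-4), and `predict_branching_ratio` is a hypothetical callable standing in for the paper's one-loop computation.

```python
# Placeholder experimental window for BR(b -> s gamma); the central value and
# uncertainty here are rough stand-ins, not a current world average.
BR_CENTRAL, BR_SIGMA = 3.3e-4, 0.3e-4

def passes_bsgamma(br_predicted, n_sigma=2.0):
    """Accept a parameter point if its predicted branching ratio lies within
    n_sigma of the experimental central value (theory errors ignored)."""
    return abs(br_predicted - BR_CENTRAL) <= n_sigma * BR_SIGMA

def constrain(points, predict_branching_ratio):
    """Filter a list of model parameter points by the b -> s gamma constraint.
    `predict_branching_ratio` wraps the model's prediction for a given point."""
    return [p for p in points if passes_bsgamma(predict_branching_ratio(p))]
```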

    A Profile Likelihood Analysis of the Constrained MSSM with Genetic Algorithms

    Full text link
    The Constrained Minimal Supersymmetric Standard Model (CMSSM) is one of the simplest and most widely studied supersymmetric extensions to the standard model of particle physics. Nevertheless, current data do not sufficiently constrain the model parameters in a way completely independent of priors, statistical measures and scanning techniques. We present a new technique for scanning supersymmetric parameter spaces, optimised for frequentist profile likelihood analyses and based on Genetic Algorithms. We apply this technique to the CMSSM, taking into account existing collider and cosmological data in our global fit. We compare our method to the MultiNest algorithm, an efficient Bayesian technique, paying particular attention to the best-fit points and implications for particle masses at the LHC and dark matter searches. Our global best-fit point lies in the focus point region. We find many high-likelihood points in both the stau co-annihilation and focus point regions, including a previously neglected section of the co-annihilation region at large m_0. We show that there are many high-likelihood points in the CMSSM parameter space commonly missed by existing scanning techniques, especially at high masses. This has a significant influence on the derived confidence regions for parameters and observables, and can dramatically change the entire statistical inference of such scans. Comment: 47 pages, 8 figures; Fig. 8, Table 7 and more discussions added to Sec. 3.4.2 in response to referee's comments; accepted for publication in JHEP
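    The abstract describes scanning the CMSSM parameter space with Genetic Algorithms optimised for profile-likelihood analyses. The following is a minimal, self-contained sketch of that idea: a generic GA maximising a user-supplied log-likelihood over box bounds. The likelihood, the bounds and the operator choices (tournament selection, blend crossover, Gaussian mutation) are illustrative placeholders, not the specific GA or the global fit used in the paper.

```python
import numpy as np

def ga_maximise(loglike, bounds, pop_size=60, generations=200,
                mutation_scale=0.1, seed=0):
    """Evolve a population of parameter points toward regions of high
    log-likelihood and return the best point found (a profile-likelihood-style
    search for the global best fit)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(bounds)))
    fit = np.array([loglike(p) for p in pop])
    best, best_fit = pop[fit.argmax()].copy(), fit.max()
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # tournament selection of two parents
            a = rng.integers(pop_size, size=2)
            b = rng.integers(pop_size, size=2)
            p1, p2 = pop[a[fit[a].argmax()]], pop[b[fit[b].argmax()]]
            w = rng.uniform(size=len(bounds))                    # blend crossover
            child = w * p1 + (1 - w) * p2
            child += rng.normal(0, mutation_scale * (hi - lo))   # Gaussian mutation
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
        fit = np.array([loglike(p) for p in pop])
        if fit.max() > best_fit:
            best_fit, best = fit.max(), pop[fit.argmax()].copy()
    return best, best_fit

# Toy usage with a stand-in likelihood having two separated modes (loosely
# mimicking distinct high-likelihood regions); bounds are illustrative.
toy = lambda p: -min(np.sum((p - 0.2) ** 2), np.sum((p - 0.8) ** 2)) / 0.01
best_point, best_ll = ga_maximise(toy, bounds=[(0.0, 1.0)] * 4)
print(best_point, best_ll)
```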